Skip to content

Validation based on hdf tree traversal#333

Merged
lukaspie merged 84 commits intomasterfrom
hdf-based-validation
Aug 21, 2025
Merged

Validation based on hdf tree traversal#333
lukaspie merged 84 commits intomasterfrom
hdf-based-validation

Conversation

@domna
Copy link
Collaborator

@domna domna commented May 17, 2024

This is a new functionality: validation not of the template, but of an existing NeXus HDF5 file.

Generally, it seems it's much more straightforward than the verification in the dict, because we have much less specialities we need to consider. However, it could be that the performance is not as good, because we traverse the hdf each time. But we build a cached recursive function to balance that off.

ToDo:

  • Add documentation

@coveralls
Copy link

coveralls commented May 17, 2024

Pull Request Test Coverage Report for Build 9517068942

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage remained the same at 96.916%

Files with Coverage Reduction New Missed Lines %
tests/dataconverter/test_nexus_tree.py 1 87.23%
Totals Coverage Status
Change from base Build 9516791059: 0.0%
Covered Lines: 597
Relevant Lines: 616

💛 - Coveralls

@domna domna force-pushed the hdf-based-validation branch from 3483bd4 to 4fff6a2 Compare May 17, 2024 11:26
@domna domna force-pushed the hdf-based-validation branch from c040e52 to 9eab38c Compare June 7, 2024 15:09
@domna domna self-assigned this Jun 7, 2024
@domna domna mentioned this pull request Jun 10, 2024
13 tasks
@domna domna mentioned this pull request Jun 28, 2024
@domna domna removed their assignment Jul 24, 2024
@lukaspie lukaspie mentioned this pull request Aug 14, 2024
2 tasks
@RubelMozumder RubelMozumder mentioned this pull request Jan 22, 2025
6 tasks
@lukaspie lukaspie force-pushed the hdf-based-validation branch 4 times, most recently from 62eedbf to ba7fa5b Compare July 30, 2025 14:10
@lukaspie lukaspie force-pushed the hdf-based-validation branch 3 times, most recently from fcee6b3 to f626a83 Compare August 7, 2025 15:03
@lukaspie lukaspie force-pushed the hdf-based-validation branch from d13c475 to 08e9530 Compare August 11, 2025 10:00
Copy link
Collaborator

@RubelMozumder RubelMozumder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments.

Copy link
Collaborator

@RubelMozumder RubelMozumder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue of the last comment!

Copy link
Collaborator

@RubelMozumder RubelMozumder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I found a few warning for NXmpes

WARNING: A link was used for /entry/sample/bias_env/voltmeter, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/gas_pressure_env/pressure_gauge, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/temperature_env/temperature_sensor, but no '@target' attribute was found.

IMO, which should not be a warning because they are recommended fields. But it is not a problem if it does not affect our test.

Also, some unit warning in xps example. Though units are there in the nexus file. Probably the reason you have not used the unit registry from pynxtools.

The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/raw/@units = counts has no documentation.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/cycle0_scan0/@units = counts has no documentation.

Regarding SPM, I will skip the validation. The application definition will be restructured very soon.

Other than that, this PR looks good to me.

@lukaspie
Copy link
Collaborator

Currently, I found a few warning for NXmpes

WARNING: A link was used for /entry/sample/bias_env/voltmeter, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/gas_pressure_env/pressure_gauge, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/temperature_env/temperature_sensor, but no '@target' attribute was found.

IMO, which should not be a warning because they are recommended fields. But it is not a problem if it does not affect our test.

One should write the target attribute when one uses a link, so it is reasonable to give a warning if it is not given in the file. Note that for the template validation, if @target is not present, we add it automatically.

For the tests that produce this warning, we should just regenerate the test files.

Also, some unit warning in xps example. Though units are there in the nexus file. Probably the reason you have not used the unit registry from pynxtools.

The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/raw/@units = counts has no documentation.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/cycle0_scan0/@units = counts has no documentation.

These are somewhat expected because NXdata/AXISNAME does not have any units attached. So the units actually are undocumented, so the log messages are expected.

@lukaspie lukaspie force-pushed the hdf-based-validation branch from ad32358 to 1d5e94b Compare August 14, 2025 15:01
@lukaspie
Copy link
Collaborator

Regarding SPM, I will skip the validation. The application definition will be restructured very soon.

I think in this particular case you do not need to change anything in pynxtools-spm, but it would be great if you could just regenerate the files once again. As you can see here, the tests only fail because we are now automaticall writing a @target attribute whenever links is used and therefore the logs are different. @RubelMozumder Would be great if you could simply update all SPM files accordingly, so that all tests pass here.

@lukaspie lukaspie requested a review from rettigl August 14, 2025 15:38
Copy link
Collaborator

@RubelMozumder RubelMozumder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
As you mentioned, I created the test files that work fine.
But, the test are only passing for hdf-based-validation branch. So, you can merge it, I have created PR rearding prepared for this PR.

Copy link
Collaborator

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a lot new code, where some of it, in particular in validation.py seem to be sometimes duplicates from existing code. This will make future maintenance rather difficult. Is it possible to move common code to a different place and reuse?

Also, while playing around, I recognized some errors which are not recognized:
This wrong dict:

  "/ENTRY/CALIBRATION[energy_referencing]":{
    "calibrated_axis": "@link:/entry/data",
    "calibrated_axis/@units": "eV",
    "physical_quantity": "energy",
    "reference_peak": "@attrs:metadata/scan_info/reference_energy",
    "binding_energy": 0.0,
    "binding_energy/@units": "eV"
  },

Generates the error
ERROR: Expected a field at /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis, but found a group.
during validation, but the link is still being set, and this a units attribute is being written to /entry/data. This is not being picked up neither by the converter nor the validator. The wrong type of /entry/energy_referencing/calibrated_axis is neither reported by the validator.

@lukaspie
Copy link
Collaborator

pynxtools-spm tests are still failing, but they work with FAIRmat-NFDI/pynxtools-spm#43

@RubelMozumder
Copy link
Collaborator

RubelMozumder commented Aug 15, 2025

pynxtools-spm tests are still failing, but they work with FAIRmat-NFDI/pynxtools-spm#43

That PR was prepared for the branch hdf-base-validation. If you merge this PR to the master branch, I can merge that PR as well. Later, you can release a patch version for pynxtools.

@lukaspie
Copy link
Collaborator

lukaspie commented Aug 19, 2025

Also, while playing around, I recognized some errors which are not recognized: This wrong dict:

  "/ENTRY/CALIBRATION[energy_referencing]":{
    "calibrated_axis": "@link:/entry/data",
    "calibrated_axis/@units": "eV",
    "physical_quantity": "energy",
    "reference_peak": "@attrs:metadata/scan_info/reference_energy",
    "binding_energy": 0.0,
    "binding_energy/@units": "eV"
  },

Generates the error ERROR: Expected a field at /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis, but found a group. during validation, but the link is still being set, and this a units attribute is being written to /entry/data. This is not being picked up neither by the converter nor the validator. The wrong type of /entry/energy_referencing/calibrated_axis is neither reported by the validator.

Indeed, in #642 (specifically in commit 5941b1f), we decided that such keys shall not be removed, even if they are erroneous. I think this was the wrong decision then and we should remove such keys in order to still produce a valid file. I have implemented with 28371b0 and 372a879 now that such keys that are linked to the wrong NeXus type are removed. In order not to create nonsense in the file, we also need to remove any further subkeys (e.g., additional attributes for fields)

Here's the result for your example:

"/ENTRY/CALIBRATION[energy_referencing]":{
  "calibrated_axis": "@link:/entry/data",
  "calibrated_axis/@units": "eV",
  "physical_quantity": "energy",
  "reference_peak": "@attrs:metadata/scan_info/reference_energy",
  "binding_energy": 0.0,
  "binding_energy/@units": "eV"
},
ERROR: The type ('group') of '/ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis' conflicts with the concept /ENTRY/energy_referencing/calibrated_axis, which is of type 'field'.
ERROR: Expected a field at /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis, but found a group.
WARNING: The group /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis will not be written.
WARNING: The attribute /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis/@units will not be written.
WARNING: The attribute /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis/@target will not be written.

And for the other way around

"/ENTRY/CALIBRATION[energy_referencing]":"@link:/entry/data/energy",
ERROR: The type ('field') of '/ENTRY[entry]/CALIBRATION[energy_referencing]' conflicts with the concept /ENTRY/energy_referencing, which is of type 'group'.
ERROR: Expected a group at /ENTRY[entry]/CALIBRATION[energy_referencing], but found a field or attribute.
WARNING: The field /ENTRY[entry]/CALIBRATION[energy_referencing] will not be written.
WARNING: The attribute /ENTRY[entry]/CALIBRATION[energy_referencing]/@target will not be written.
ERROR: The type ('field') of '/ENTRY[entry]/CALIBRATION[energy_referencing]' conflicts with the concept /ENTRY/CALIBRATION, which is of type 'group'.

Note that in the latter case, we actually find two different conflicts (with the named group energy_referencing(NXcalibration) as well as with the unnamed (NXcalibration).

I think this should work nicely and also in the resulting files there's no confusion anymore.

@lukaspie lukaspie force-pushed the hdf-based-validation branch from 7a0b8c0 to 28371b0 Compare August 19, 2025 15:33
@lukaspie lukaspie requested a review from rettigl August 19, 2025 15:33
Copy link
Collaborator

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into my comments. I did not check all changes again, but tested my issues from before, and they seem to be solved. So LGTM.

@lukaspie lukaspie merged commit f1cfba8 into master Aug 21, 2025
15 of 16 checks passed
@lukaspie lukaspie deleted the hdf-based-validation branch August 21, 2025 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants